Removed map of subquery to subquery index in favor of storing index as part of DISI wrapper to improve hybrid query latencies by 20% #711
Merged
martin-gaievski
merged 3 commits into
opensearch-project:main
from
martin-gaievski:save_subquery_index_in_disiwrapper_instead_computing_maptoindex
Apr 27, 2024
martin-gaievski
added
backport 2.x
Label will add auto workflow to backport PR to 2.x branch
v2.14.0
hybrid search
hybrid query performance optimization
labels
Apr 26, 2024
…s part of disi wrapper to improve hybrid query latencies by 20% Signed-off-by: Martin Gaievski <[email protected]>
martin-gaievski
force-pushed
the
save_subquery_index_in_disiwrapper_instead_computing_maptoindex
branch
from
April 26, 2024 20:28
5463803
to
2c63f8a
Compare
martin-gaievski
requested review from
heemin32,
navneet1v,
VijayanB,
vamshin,
jmazanec15,
naveentatikonda,
junqiu-lei,
sean-zheng-amazon,
model-collapse,
zane-neo,
ylwu-amzn,
jngz-es,
vibrantvarun and
zhichao-aws
as code owners
April 26, 2024 21:04
VijayanB
reviewed
Apr 26, 2024
src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java
Outdated
Signed-off-by: Martin Gaievski <[email protected]>
VijayanB
reviewed
Apr 26, 2024
src/main/java/org/opensearch/neuralsearch/search/HybridDisiWrapper.java
Outdated
navneet1v
reviewed
Apr 26, 2024
src/main/java/org/opensearch/neuralsearch/query/HybridQueryScorer.java
Outdated
navneet1v
approved these changes
Apr 26, 2024
Signed-off-by: Martin Gaievski <[email protected]>
martin-gaievski
force-pushed
the
save_subquery_index_in_disiwrapper_instead_computing_maptoindex
branch
from
April 26, 2024 23:21
a1bb0ce
to
cc89989
Compare
VijayanB
approved these changes
Apr 27, 2024
LGTM.
opensearch-trigger-bot bot
pushed a commit
that referenced
this pull request
Apr 27, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711) * Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20% Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit a3bdde5)
martin-gaievski
added a commit
that referenced
this pull request
Apr 29, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711) (#712) * Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20% Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit a3bdde5) Co-authored-by: Martin Gaievski <[email protected]>
opensearch-trigger-bot bot
pushed a commit
that referenced
this pull request
Apr 29, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711) * Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20% Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit a3bdde5)
martin-gaievski
added a commit
that referenced
this pull request
Apr 29, 2024
…s part of DISI wrapper to improve hybrid query latencies by 20% (#711) (#715) * Removed map of subquery to subquery index in favor of storing index as part of disi wrapper to improve hybrid query latencies by 20% Signed-off-by: Martin Gaievski <[email protected]> (cherry picked from commit a3bdde5) Co-authored-by: Martin Gaievski <[email protected]>
Labels
backport 2.x
Label will add auto workflow to backport PR to 2.x branch
backport 2.13
hybrid query performance optimization
hybrid search
v2.14.0
This PR continues the work to improve hybrid query latency as part of meta issue #705.
Based on the following flamegraph, the next area likely to give a large boost is the lookup of a sub-query's index by its query. The lookup is needed to get the index of the sub-query and store that sub-query's score for a document (code ref).
Most of the time (~28%) is spent storing and looking up the index with the query as the key. Depending on the exact sub-query, computing its hash code can be slow (hybrid query works with any type of OpenSearch query). This is a problem on large datasets because the lookup is done for each document by each sub-query.
We can avoid creating and using the query-to-index map by storing the sub-query index at the time we create the collection of DISIWrappers.
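To make the idea concrete, here is a minimal, self-contained sketch of the two approaches. It is not the code from this PR: `SubQueryIndexSketch`, `IndexedWrapper`, `placeholderScore`, and the use of a `String` in place of a real OpenSearch query are all hypothetical names and simplifications chosen for illustration, and no Lucene or OpenSearch classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SubQueryIndexSketch {

    /** Hypothetical stand-in for a DISI wrapper that remembers its sub-query's position. */
    static final class IndexedWrapper {
        final String subQuery;   // a String stands in for an arbitrary OpenSearch query
        final int subQueryIndex; // assigned once, when the wrapper is created

        IndexedWrapper(String subQuery, int subQueryIndex) {
            this.subQuery = subQuery;
            this.subQueryIndex = subQueryIndex;
        }
    }

    /** Placeholder score so both variants can be compared; not a real scoring function. */
    static float placeholderScore(int subQueryIndex, int doc) {
        return doc * 10 + subQueryIndex;
    }

    /** Map-based variant: each score write looks up the slot with the query as the key. */
    static float[] scoresViaMap(List<String> subQueries, int doc) {
        // In a real scorer the map would be built once; it is rebuilt here to keep the sketch short.
        Map<String, Integer> queryToIndex = new HashMap<>();
        for (int i = 0; i < subQueries.size(); i++) {
            queryToIndex.put(subQueries.get(i), i);
        }
        float[] scores = new float[subQueries.size()];
        for (String subQuery : subQueries) {
            int index = queryToIndex.get(subQuery); // hashCode() and equals() on every lookup
            scores[index] = placeholderScore(index, doc);
        }
        return scores;
    }

    /** Creates the wrappers and records each sub-query's index at creation time. */
    static List<IndexedWrapper> wrap(List<String> subQueries) {
        List<IndexedWrapper> wrappers = new ArrayList<>();
        for (int i = 0; i < subQueries.size(); i++) {
            wrappers.add(new IndexedWrapper(subQueries.get(i), i));
        }
        return wrappers;
    }

    /** Wrapper-based variant: the slot is a plain field read, with no hashing. */
    static float[] scoresViaWrappers(List<IndexedWrapper> wrappers, int doc) {
        float[] scores = new float[wrappers.size()];
        for (IndexedWrapper wrapper : wrappers) {
            scores[wrapper.subQueryIndex] = placeholderScore(wrapper.subQueryIndex, doc);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> subQueries = List.of("match", "knn", "term");
        float[] viaMap = scoresViaMap(subQueries, 7);
        float[] viaWrappers = scoresViaWrappers(wrap(subQueries), 7);
        // Both variants fill the same slots with the same values.
        System.out.println(Arrays.equals(viaMap, viaWrappers));
    }
}
```

Both variants produce the same per-sub-query score array. The difference is that the wrapper-based variant replaces a hash lookup per document per sub-query with a field read, which is the cost the description above attributes to the map.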
According to benchmark results, this gives about a 20% performance boost. I ran the benchmark on 2.13 using the noaa OSB workload; all times are in ms:
Before the change (baseline):
After the change:
Issues Resolved
#705
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.